Preventive Maintenance of a Two-Unit Standby Redundant System with a Good State
A preventive maintenance policy is proposed for a two-unit standby redundant system in which each unit has good, degraded and failed states. The mean time to first system down is derived using the theory of semi-Markov processes. Further, the condition under which the policy is effective is obtained.
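The abstract derives the mean time to first system down from a semi-Markov model. A standard way to compute such a quantity is sketched below. It assumes you already know the embedded-chain transition probabilities among the "up" states and the mean sojourn time in each one. The mean times to absorption m then satisfy m_i = mu_i + sum_j P_ij m_j, i.e. (I - P_T) m = mu. The two-state numbers are hypothetical and do not reproduce the paper's good/degraded/failed model or its maintenance policy.

```python
def mean_time_to_absorption(p_transient, mean_sojourn):
    """Solve (I - P_T) m = mu for the mean time to absorption m.

    p_transient[i][j]: embedded-chain probability of jumping from transient
    ("system up") state i to transient state j; any missing row mass is the
    probability of jumping to the absorbing "system down" state.
    mean_sojourn[i]: mean holding time in state i.
    """
    n = len(mean_sojourn)
    # Augmented matrix [I - P_T | mu].
    a = [[(1.0 if i == j else 0.0) - p_transient[i][j] for j in range(n)]
         + [float(mean_sojourn[i])] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("system down is not reached with certainty")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical example: state 0 always moves to state 1 (mean stay 10);
# state 1 (mean stay 5) returns to state 0 or goes down with probability 0.5 each.
print(mean_time_to_absorption([[0.0, 1.0], [0.5, 0.0]], [10.0, 5.0]))  # → [30.0, 20.0]
```

Entry 0 of the result is the mean time to first system down when starting from state 0. A policy comparison of the kind the abstract describes would evaluate this quantity with and without preventive maintenance.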
PHASE DIFFERENCE BETWEEN FORWARD SWINGS OF
In previous observations of lower-extremity movement in long-distance running, a seemingly ineffective action was noticed: the upper leg first swung to its most forward angle and then moved backward before the lower leg swung to its most forward angle, prior to the foot touching the floor (Kawai & Hiki, 1999). In this study, the relationship between this phase difference and the amount of work around the hip joint was investigated using a computer simulation, in order to determine the meaning of the phase difference. From the results, it was reasoned that the upper leg swings forward in a phase advanced relative to the lower leg in order to decelerate the mass of the lower leg smoothly by raising the knee, and to minimize the change of work needed to decelerate the whole leg before the foot touches the floor.
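The study compares the amount of work around the hip joint across simulated swing patterns. The sketch below assumes the simulation outputs sampled hip torque and hip angular velocity. Under that assumption, joint mechanical work is the time integral of joint power (torque times angular velocity), here approximated with the trapezoidal rule. The function name and sample values are hypothetical, not taken from the paper.

```python
def joint_work(times, torque, angular_velocity):
    """Integrate joint power P = torque * angular_velocity over time.

    Returns (net_work, positive_work, negative_work); units are J when time
    is in s, torque in N*m and angular velocity in rad/s. The positive/
    negative split is approximate: each interval is classified by the sign
    of its trapezoid area, so intervals where power changes sign are lumped.
    """
    if not (len(times) == len(torque) == len(angular_velocity)):
        raise ValueError("all inputs must have the same length")
    power = [t * w for t, w in zip(torque, angular_velocity)]
    net = pos = neg = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        area = 0.5 * (power[k - 1] + power[k]) * dt  # trapezoidal rule
        net += area
        if area >= 0:
            pos += area
        else:
            neg += area  # negative work, e.g. decelerating the swinging leg
    return net, pos, neg


# Hypothetical samples: constant 2 N*m torque at a constant 3 rad/s for 1 s.
print(joint_work([0.0, 0.5, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]))  # → (6.0, 6.0, 0.0)
```

Reporting the negative part separately is one way to quantify the work spent decelerating the leg before foot contact. Whether the paper defines "change of work" this way is not stated in the abstract.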
CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation
Navigation guided by natural language instructions is particularly suitable for Domestic Service Robots that interact naturally with users. This task involves predicting a sequence of actions that leads to a specified destination, given a natural language navigation instruction. The task thus requires understanding instructions such as "Walk out of the bathroom and wait on the stairs that are on the right". Vision-and-Language Navigation remains challenging, notably because it requires exploring the environment and accurately following a path specified by the instructions, which in turn requires modeling the relationship between language and vision. To address this, we propose the CrossMap Transformer network, which encodes linguistic and visual features to sequentially generate a path. The CrossMap Transformer is tied to a Transformer-based speaker that generates navigation instructions. The two networks share common latent features for mutual enhancement through a double back-translation model: generated paths are translated into instructions, while generated instructions are translated into paths. The experimental results show the benefits of our approach in terms of both instruction understanding and instruction generation.
Comment: 8 pages, 5 figures, 5 tables. Submitted to IEEE Robotics and Automation Letters
Transducer-based language embedding for spoken language identification
Acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features and lack explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. Experiments were carried out on the large-scale multilingual LibriSpeech and VoxLingua107 datasets. Experimental results showed that the proposed method significantly improves LID performance, with relative improvements of 12% to 59% on in-domain datasets and 16% to 24% on cross-domain datasets.
Comment: This paper was submitted to Interspeech 202
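The abstract reports relative rather than absolute improvements and does not name the metric. The sketch below assumes an error-type metric where lower is better (equal error rate is a common choice for LID). It shows how such a percentage is computed and how to invert it to recover the baseline value that a reported improvement implies. Function names and numbers are hypothetical, not results from the paper.

```python
def relative_improvement(baseline_error, new_error):
    """Relative reduction of a lower-is-better metric, e.g. 0.25 == 25%."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def implied_baseline(new_error, rel_improvement):
    """Invert relative_improvement: baseline = new / (1 - r)."""
    if rel_improvement >= 1:
        raise ValueError("a relative improvement of 100% or more has no finite baseline")
    return new_error / (1.0 - rel_improvement)


# Hypothetical numbers, not results from the paper:
print(relative_improvement(20.0, 15.0))  # → 0.25
print(implied_baseline(15.0, 0.25))      # → 20.0
```

Because the ratio is taken against the baseline, a "59%" gain on a weak baseline and a "12%" gain on a strong one can correspond to very different absolute error changes.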
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
Due to the modality discrepancy between textual and acoustic modeling, efficiently transferring linguistic knowledge from a pretrained language model (PLM) to acoustic encoding for automatic speech recognition (ASR) remains a challenging task. In this study, we propose a cross-modality knowledge transfer (CMKT) learning framework in a connectionist temporal classification (CTC)-based ASR system, where hierarchical acoustic alignments with the linguistic representation are applied. Additionally, we propose the use of Sinkhorn attention in the cross-modality alignment process, of which transformer attention is a special case. CMKT learning is intended to compel the acoustic encoder to encode rich linguistic knowledge for ASR. On the AISHELL-1 dataset, with CTC greedy decoding for inference (without using any language model), we achieved state-of-the-art performance with character error rates (CERs) of 3.64% and 3.94% on the development and test sets, respectively, corresponding to relative improvements of 34.18% and 34.88% over the baseline CTC-ASR system.
Comment: Submitted to ICASSP 202
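The abstract states that transformer attention is a special case of Sinkhorn attention. The sketch below illustrates that relationship in pure Python using the usual Sinkhorn-Knopp scheme, which alternately rescales the rows and columns of the kernel exp(scores). A single row normalization reproduces the row-wise softmax of standard attention. Further iterations also balance the column sums, so no key is over- or under-attended overall. The function names, the iteration count, and the column target n_rows / n_cols are illustrative assumptions, not the paper's implementation. Score scaling, masking and the hierarchical alignment are omitted.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, as used in standard scaled dot-product attention."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def sinkhorn_weights(scores, n_iters):
    """Sinkhorn-normalize exp(scores) by alternating row/column scaling.

    n_iters counts row normalizations. n_iters=1 performs a single row
    normalization, which equals softmax_rows(scores). More iterations also
    balance the columns: rows sum to 1 and each column tends toward
    n_rows / n_cols, so the total mass n_rows is preserved.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    g = max(max(row) for row in scores)  # global shift; cancels in each row normalization
    k = [[math.exp(s - g) for s in row] for row in scores]
    col_target = n_rows / n_cols
    for it in range(n_iters):
        if it > 0:
            # Column step: rescale each column to sum to col_target.
            for j in range(n_cols):
                col_sum = sum(k[i][j] for i in range(n_rows))
                for i in range(n_rows):
                    k[i][j] *= col_target / col_sum
        # Row step: rescale each row to sum to 1.
        for i in range(n_rows):
            row_sum = sum(k[i])
            k[i] = [v / row_sum for v in k[i]]
    return k


def attend(weights, values):
    """out[i] = sum_j weights[i][j] * values[j] (one value vector per key)."""
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]


# Hypothetical query-key scores (e.g. acoustic frames vs. token embeddings).
scores = [[1.0, 2.0, 0.5], [0.3, -1.0, 2.0], [0.0, 0.0, 1.0]]
one_step = sinkhorn_weights(scores, 1)
print(max(abs(a - b) for ra, rb in zip(one_step, softmax_rows(scores))
          for a, b in zip(ra, rb)))  # ~0: one row step is exactly softmax
balanced = sinkhorn_weights(scores, 100)
print([sum(r[j] for r in balanced) for j in range(3)])  # column sums approach 1.0
```

In a cross-modality setting with T acoustic frames attending over N linguistic tokens, the balanced columns push every token to receive a comparable share of acoustic mass, which is one intuition for using Sinkhorn rather than plain softmax for alignment.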